Scalable ad-hoc entity extraction from text collections
نویسندگان
چکیده
Supporting entity extraction from large document collections is important for enabling a variety of important data analysis tasks. In this paper, we introduce the “ad-hoc” entity extraction task where entities of interest are constrained to be from a list of entities that is specific to the task. In such scenarios, traditional entity extraction techniques that process all the documents for each ad-hoc entity extraction task can be significantly expensive. In this paper, we propose an efficient approach that leverages the inverted index on the documents to identify the subset of documents relevant to the task and processes only those documents. We demonstrate the efficiency of our techniques on real datasets.
منابع مشابه
Pipelines for Ad-hoc Large-scale Text Mining
Pipelines for Ad-hoc Large-scale Text Mining Today’s web search and big data analytics applications aim to address information needs (typically given in the form of search queries) ad-hoc on large numbers of texts. In order to directly return relevant information instead of only returning potentially relevant texts, these applications have begun to employ text mining. The term text mining cover...
متن کاملScalable Phrase Mining for Ad-hoc Text Analytics
Large text corpora with news, customer mail and reports, or Web 2.0 contributions offer a great potential for enhancing business-intelligence applications. We propose a framework for performing text analytics on such data in a versatile, efficient, and scalable manner. While much of the prior literature has emphasized mining keywords or tags in blogs or social-tagging communities, we emphasize ...
متن کاملDesign and evaluation of two scalable protocols for location management of mobile nodes in location based routing protocols in mobile Ad Hoc Networks
Heretofore several position-based routing protocols have been developed for mobile ad hoc networks. Many of these protocols assume that a location service is available which provides location information on the nodes in the network.Our solutions decrease location update without loss of query success rate or throughput and even increase those.Simulation results show that our methods are effectiv...
متن کاملDesign and evaluation of two scalable protocols for location management of mobile nodes in location based routing protocols in mobile Ad Hoc Networks
Heretofore several position-based routing protocols have been developed for mobile ad hoc networks. Many of these protocols assume that a location service is available which provides location information on the nodes in the network.Our solutions decrease location update without loss of query success rate or throughput and even increase those.Simulation results show that our methods are effectiv...
متن کاملInteresting-Phrase Mining for Ad-Hoc Text Analytics
Large text corpora with news, customer mail and reports, or Web 2.0 contributions offer a great potential for enhancing business-intelligence applications. We propose a framework for performing text analytics on such data in a versatile, efficient, and scalable manner. While much of the prior literature has emphasized mining keywords or tags in blogs or social-tagging communities, we emphasize ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- PVLDB
دوره 1 شماره
صفحات -
تاریخ انتشار 2008